Performance Limits Due to Inter-Cluster Data Forwarding in Wire-Limited ILP Microprocessors
Abstract
The growing speed gap between transistors and wire interconnects is forcing the development of distributed, or clustered, architectures. These designs partition the chip into small regions with fast intra-cluster communication; communication between clusters incurs longer latency. The hardware and/or software is responsible for scheduling instructions to clusters such that critical-path communication occurs within a cluster. This paper explores fundamental interactions between semiconductor technology and clustered architectures. The relationship between key technology parameters (inter-cluster wire delay and transistor switching delay) and key architecture parameters (superscalar vs. multithreaded instruction dispatch, and value prediction support) is investigated. The GENESYS modeling tool is used to predict inter-cluster latencies as VLSI technology advances. The study shows that performance limits of the conventional superscalar approach are substantially higher with zero-delay wires. As wire delay increases, the performance of these designs degrades quickly. Threaded designs are more tolerant of wire delay. The optimal thread size changes with advancing VLSI technology, suggesting a highly adaptive architecture. Value prediction is shown to be useful in all cases, but provides more benefit to the multithreaded design.
Similar resources
A wire delay-tolerant reconfigurable unit for a clustered programmable-reconfigurable processor
Wire delay is rapidly becoming a major bottleneck in reconfigurable systems, creating a significant gap between the clock rates of reconfigurable logic and custom circuits. In this paper, we describe the design of the reconfigurable clusters on the Amalgam clustered programmable-reconfigurable processor. Amalgam’s reconfigurable clusters are divided into four segments of reconfigurable logic, l...
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors
Current microprocessors incorporate techniques to aggressively exploit instruction-level parallelism (ILP). This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latency-hiding optimization of software prefetching. Our results show that, while ILP techniques substantially reduce CPU time in multiprocessors, they are les...
Architectural support for thread communications in multi-core processors
In the ongoing quest for greater computational power, efficiently exploiting parallelism is of paramount importance. Architectural trends have shifted from improving single-threaded application performance, often achieved through instruction level parallelism (ILP), to improving multithreaded application performance by supporting thread level parallelism (TLP). Thus, multi-core processors incorp...
Energy-Aware Probabilistic Epidemic Forwarding Method in Heterogeneous Delay Tolerant Networks
Due to the increasing use of wireless communications, infrastructure-less networks such as Delay Tolerant Networks (DTNs) should be highly considered. DTN is most suitable where there is an intermittent connection between communicating nodes such as wireless mobile ad hoc network nodes. In general, a message sending node in DTN copies the message and transmits it to nodes which it encounters. A...
The Performance Impact of Exploiting Branch ILP with Tree Representation of ILP Code
Modern single-CPU microprocessors exploit instruction-level parallelism (ILP) by deriving their performance advantage mainly from parallel execution of ALU and memory instructions within a single clock cycle. This performance advantage obtained by exploiting data ILP is severely offset by sequential execution of conditional branches, especially in branch-intensive non-numerical code. Consequent...
Publication date: 2000